Abstract

This paper presents the first sublinear-time deterministic parallel algorithms for bipartite
matching and several related problems, including maximal node-disjoint paths, depth-first
search,  and  flows  in  zero-one  networks.  Our  results  are  based  on  a  better  understanding  of  the 
combinatorial  structure  of  the  above  problems,  which  leads  to  new  algorithmic  techniques.  In 
particular,  we  show  how  to  use  maximal  matching  to  extend,  in  parallel,  a  current  set  of  node- 
disjoint  paths  and  how  to  take  advantage  of  the  parallelism  that  arises  when  a  large  number  of 
nodes  are  “active”  during  an  execution  of  a  push/relabel  network  flow  algorithm. 

We  also  show  how  to  apply  our  techniques  to  design  parallel  algorithms  for  the  weighted 
versions  of  the  above  problems.  In  particular,  we  present  sublinear-time  deterministic  parallel 
algorithms for finding a minimum-weight bipartite matching and for finding a minimum-cost
flow in a network with zero-one capacities if the weights are polynomially bounded integers.

Keywords: Bipartite matching, assignment problem, network flow, parallel algorithms.

1  Introduction 

Bipartite  matching  and  related  problems  are  well  studied  in  the  contexts  of  both  sequential  (see 
e.g.,  [ET75,HK73,Tar72])  and  parallel  (see  e.g.  [AA87,KUW86,MVV87])  computation.  Though 
the  latter  research  produced  RNC  algorithms,  no  sublinear-time  deterministic  parallel  algorithms 
were  known.  In  this  paper  we  describe  a  number  of  techniques  that  allow  us  to  construct  such 
algorithms  for  bipartite  matching,  flows  in  zero-one  capacity  networks,  depth-first  search,  and  the 
problem  of  finding  a  maximal  set  of  node-disjoint  paths. 

Our  algorithms  for  bipartite  matching  and  for  zero-one  flows  generalize  to  weighted  versions  of 
these  problems.  These  generalizations  involve  scaling,  so  the  resulting  algorithms  run  in  sublinear 
time  if  the  weights  are  polynomially  bounded.  The  assignment  problem  with  unary  weights  is  known 
to  be  in  RNC  [KUW86],  but  no  sublinear  deterministic  algorithms  have  been  known  previously. 

Our  results  are  based  on  a  better  understanding  of  the  combinatorial  structure  of  the  above 
problems,  which  leads  to  new  algorithmic  techniques.  In  particular,  we  show  how  to  use  maximal 
matching  to  extend,  in  parallel,  a  current  set  of  node-disjoint  paths.  We  also  show  how  to  take 
advantage  of  the  parallelism  that  arises  when  a  large  number  of  nodes  are  “active”  during  an 
execution  of  a  push/relabel  network  flow  algorithm. 

To  address  the  above  problems,  we  need  the  following  notation  and  definitions.  Given  a  graph 
G  =  (V,  E),  let  n  denote  the  number  of  nodes  in  the  graph  and  let  m  denote  the  number  of  edges 
(or  arcs,  if  the  graph  is  directed).  Throughout  the  paper  we  shall  deal  only  with  simple  paths,  and 
a  path  will  always  mean  a  simple  path.  The  length  of  a  path  is  defined  as  follows.  If  there  is  a 
length  associated  with  each  edge,  then  the  length  of  the  path  is  the  sum  of  the  lengths  of  the  edges 
on  the  path.  Otherwise,  the  length  of  the  path  is  the  number  of  edges  on  the  path.  A  matching  is 
a  set  of  edges  such  that  each  node  in  the  graph  has  at  most  one  edge  in  the  matching  incident  to 
it.  A  perfect  matching  is  a  matching  such  that  each  node  in  the  graph  has  exactly  one  edge  in  the 
matching  incident  to  it,  and  a  maximal  matching  is  a  matching  such  that  there  is  no  edge  between 
any two unmatched nodes. The weight of a matching is the sum of the weights of the edges in the
matching.
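
The definitions above can be checked mechanically. The following Python sketch is illustrative only: the function names and the edge-list representation are assumptions made here, and the greedy routine is a simple sequential way to obtain a maximal matching, not the parallel algorithm used later in the paper.

```python
def is_matching(edges):
    """True if every node has at most one incident edge in `edges`."""
    seen = set()
    for u, v in edges:
        if u == v or u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_perfect_matching(nodes, matching):
    """True if every node has exactly one incident edge in `matching`."""
    matched = {x for e in matching for x in e}
    return is_matching(matching) and matched == set(nodes)


def is_maximal_matching(graph_edges, matching):
    """True if no graph edge joins two unmatched nodes."""
    matched = {x for e in matching for x in e}
    return is_matching(matching) and all(
        u in matched or v in matched for u, v in graph_edges
    )


def matching_weight(matching, weight):
    """Sum of the weights of the matching edges (weight: dict edge -> number)."""
    return sum(weight[e] for e in matching)


def greedy_maximal_matching(graph_edges):
    """Sequential greedy scan: keep an edge if both end-points are still free."""
    matched, result = set(), []
    for u, v in graph_edges:
        if u not in matched and v not in matched:
            result.append((u, v))
            matched.update((u, v))
    return result
```

On the path a-b-c-d, for instance, the single edge (b, c) is a maximal matching but not a perfect one, which shows that maximal matchings need not have maximum cardinality.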

Our model of parallel computation is the CRCW PRAM (concurrent-read concurrent-write parallel
random access machine) [FW78]. Given a directed graph with n nodes and m arcs, BFS(n, m)
and SSP(n, m) denote the maximum of n + m and the number of processors required to find a
breadth-first search tree and a single-source shortest-path tree in O((log n)^2) time, respectively. It
is known that SSP(n, m) ≤ n^3, and that BFS(n, m) is at most the number of processors required
to multiply two n × n matrices in O(log n) time, which is O(n^{2.5}) [PR85].

In  this  paper  we  address  the  following  problems. 

Maximal node-disjoint paths  We are given a graph G = (V, E) with a set of sources S ⊆ V
and a set of sinks T ⊆ V, such that S ∩ T = ∅. A set Π of node-disjoint paths is said to be
maximal if each path in Π starts at a distinct source and terminates at a distinct sink, and there
is no path from a source to a sink in the graph G − Π. The maximal node-disjoint paths problem
is to find a maximal set of node-disjoint paths from S to T. We give an algorithm for the maximal
node-disjoint paths problem which runs in O(√n (log n)^3) time, both on directed and on undirected
graphs. On undirected graphs our algorithm uses O(n + m) processors, and on directed graphs
it uses BFS(n, m) processors. The algorithm also solves a slight generalization of the maximal
node-disjoint paths problem; this generalization is useful for constructing a depth-first search tree
in an undirected graph.

Depth-first search in undirected graphs  Given an undirected graph G = (V, E) and a
distinguished node, construct a depth-first search tree of the graph rooted at this node. A tree T is a
depth-first search tree iff for all non-tree edges (u, v), u and v lie on the same path starting at the
root of the tree. All previous parallel algorithms for the problem use randomization [And87,AA87].
We show how to use our techniques to construct a deterministic algorithm for finding a depth-first
search tree in O(√n (log n)^5) time using O(n + m) processors.
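
The characterization of depth-first search trees given above translates directly into a test. The sketch below is an illustration under assumptions introduced here: the tree is given as a hypothetical `parent` dictionary (child -> parent, with the root absent or mapped to `None`), the tree edges are assumed to be edges of the graph, and ancestor queries are answered with entry and exit times of a traversal of the tree.

```python
def is_dfs_tree(nodes, edges, parent, root):
    """Check that every non-tree edge joins a node to one of its ancestors."""
    nodes = list(nodes)
    children = {v: [] for v in nodes}
    for v, p in parent.items():
        if v != root:
            children[p].append(v)

    # Entry/exit times of an iterative traversal of the tree itself.
    tin, tout, clock = {}, {}, 0
    stack = [(root, False)]
    while stack:
        v, leaving = stack.pop()
        if leaving:
            tout[v] = clock
        else:
            tin[v] = clock
            stack.append((v, True))
            stack.extend((c, False) for c in children[v])
        clock += 1
    if len(tin) != len(nodes):
        return False  # the tree does not span the graph

    def is_ancestor(a, b):
        return tin[a] <= tin[b] and tout[b] <= tout[a]

    tree_edges = {frozenset((v, p)) for v, p in parent.items() if v != root}
    for u, v in edges:
        if frozenset((u, v)) in tree_edges:
            continue
        if not (is_ancestor(u, v) or is_ancestor(v, u)):
            return False
    return True
```

For a triangle on nodes 1, 2, 3 rooted at 1, the path 1-2-3 passes the test, while the star with both 2 and 3 attached to 1 fails because the non-tree edge (2, 3) joins two different branches.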

Bipartite matching  Given an undirected bipartite graph G = (S, T, E), where S ∪ T is the set
of nodes (S ∩ T = ∅) and E ⊆ S × T is the set of edges, find a maximum cardinality matching in
G. We present an algorithm for the bipartite matching problem which runs in O(n^{2/3} (log n)^3) time
using BFS(n, m) processors. Although the problem is known to be in RNC [KUW86,MVV87], the
fastest previously known deterministic algorithm [GT88] runs in O(n (log n)^2) time.

Assignment problem  (Also known as the weighted bipartite matching problem.) Given a weighted
undirected bipartite graph G = (S, T, E), find a minimum weight perfect matching. We present an
algorithm for the assignment problem that uses SSP(n, m) processors, and runs in
O(n^{2/3} (log n)^3 log(nC)) time if the edge weights are integers in the range [-C, C]. Under the
assumption that edge weights are given in unary, this problem is known to be in RNC [KUW86,MVV87];
our algorithm is sublinear under this assumption. The fastest previously known deterministic
algorithm [GT88] runs in O(n (log n)^3 log(nC)) time.

Flows in zero-one networks  We also study flows in networks with unit capacities. We study
two versions of the problem, the maximum flow problem and the minimum-cost flow problem.
Note that the bipartite matching problem is a special case of the first problem, and the assignment
problem is a special case of the second problem. Our algorithm for the first problem runs in
O(m^{2/3} log n) time using BFS(n, m) processors. We also show that if the network has no multiple
arcs, the algorithm can be modified to run in O((nm)^{2/5} log n) time. Our algorithm for the second
problem runs in O(m^{2/3} (log n)^2 log(nC)) time using SSP(n, m) processors.


The problems discussed above are important tools for the design of efficient algorithms. For example,
a linear-time (sequential) depth-first search algorithm leads to linear-time algorithms for many
other  problems.  NC  algorithms  for  the  above  problems  would  result  in  NC  algorithms  for  many 
other  problems  as  well.  Our  results  are  a  step  towards  the  design  of  efficient  parallel  algorithms  for 
these  problems.  The  ideas  of  this  paper  may  lead  to  improved  sequential  algorithms  as  well. 

The  paper  is  organized  as  follows.  In  Section  2  we  describe  a  parallel  algorithm  for  the  maximal 
node-disjoint  paths  problem  and  show  how  to  apply  it  to  depth-first  search  in  undirected  graphs. 
In  Section  3  we  give  parallel  algorithms  for  maximum  matching  and  zero-one  flow  problems;  in 
Section 4 we extend the results to the weighted versions of these problems.


2  Maximal  Node-Disjoint  Paths 

In this section we present an efficient parallel algorithm that finds a maximal set of node-disjoint
paths from a set of sources to a set of sinks in a directed or undirected graph. We describe the
variation of the algorithm that works for undirected graphs. The extension to the directed case is
straightforward.

A natural approach to solve this problem is to find the paths one-by-one. The problem with this
approach is a potentially large (Ω(n)) number of paths, which leads to a superlinear running time.
Another  approach  is  to  maintain  the  current  set  of  paths,  extending  as  many  paths  as  possible  at 
each  iteration.  This  approach  has  two  problems.  First,  it  takes  time  that  is  proportional  to  the 
length  of  the  longest  path,  and  therefore  it  is  slow  if  the  paths  are  long.  Second,  it  may  not  be 
possible  to  extend  a  large  number  of  paths  at  each  iteration  because  of  the  interaction  among  the 
paths.  Our  algorithm  combines  these  two  approaches. 

The algorithm Maximal-Paths solves a slight generalization of the node-disjoint paths problem:
Given a set of node-disjoint paths connecting sources to intermediate nodes, find a set of
node-disjoint paths from the sources to the sinks such that for any node that is on an input path but on
no  output  path,  every  path  from  this  node  to  a  sink  intersects  an  output  path.  The  node-disjoint 
paths  problem  corresponds  to  the  case  when  each  one  of  the  input  paths  is  a  single  node.  The 
generalization  of  the  problem  is  required  for  the  depth-first  search  algorithm  described  below. 

Figure  1  describes  the  Maximal-Paths  algorithm.  The  algorithm  maintains  two  sets  of  node- 
disjoint  paths:  Active  paths  and  Dead  paths.  An  active  path  starts  at  a  source  and  ends  at  some 
intermediate  node  which  is  not  a  sink;  a  dead  path  connects  a  source  to  a  sink.  The  initial  set  of 
active  paths  is  the  set  of  the  input  paths.  The  nodes  are  divided  into  idle,  active,  and  dead,  denoted 
by V_i, V_a, and V_d, respectively. A node is active if it belongs to a path, dead if it was active during
the algorithm but currently does not belong to any path, and idle otherwise. Intuitively, a node
becomes dead if the current set of active paths can be extended to a maximal set of node-disjoint
paths without using this node. Initially, V_d is empty and V_i is the set of nodes not on any input
path. 

The  algorithm  consists  of  two  stages.  The  first  stage  proceeds  in  iterations,  where  at  each 
iteration  the  algorithm  extends  some  of  the  active  paths  by  idle  nodes,  changing  the  status  of  these 
nodes  to  active.  The  algorithm  “clips”  the  other  active  paths,  i.e.,  removes  end-point  nodes  from 
these paths, and changes the status of the removed nodes to dead. Let H be the set of nodes that
are the end-points of the active paths, and H' be the set of idle nodes that are neighbors of nodes
in H. First, the algorithm finds a maximal matching in the bipartite graph induced by the set of
edges (H × H') ∩ E, where E is the set of edges in the input graph. If a node v ∈ H is matched
to v' ∈ H', then the path associated with v is extended, and v' becomes the new end-point of the
path, changing its status to active. If v' is one of the sinks, this path changes its status to dead.
If a node v ∈ H is not matched, the path associated with v is “clipped”: the node previous to v on
this path becomes the new end-point of the path, and the status of v changes to dead. This stage
continues as long as the number of active paths is at least √n.

During the second stage, the algorithm extends the active paths to sinks one-by-one. To
extend a path P = (v_1, v_2, ..., v_k), the algorithm first computes connected components in the
graph induced by edges in (V_i × V_i) ∪ (V_a × V_i). Let v_r be the node on P such that there exists a path
P' from v_r to some sink t over idle nodes, and for all i, r < i ≤ k, no idle sink is reachable from v_i
by a path that consists of idle nodes only. Then the algorithm clips P, changes the status of the
nodes {v_i : r < i ≤ k} to dead, changes the status of nodes on P' to active, and extends the path
(v_1, v_2, ..., v_r) by attaching it to P'.
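
To make the two stages concrete, here is a sequential Python simulation for undirected graphs. It is a sketch under stated assumptions, not the parallel implementation: the maximal matching is found by a greedy scan instead of a parallel maximal matching algorithm, reachability in the second stage uses a breadth-first search from each candidate node instead of a connected-components computation, and all names (`maximal_paths`, `idle_tail`, the string statuses) are introduced here for illustration.

```python
import math
from collections import deque


def maximal_paths(nodes, adj, sinks, input_paths):
    """adj: dict node -> set of neighbours; input_paths: node-disjoint lists,
    each starting at a source. Returns the source-to-sink (dead) paths."""
    nodes, sinks = list(nodes), set(sinks)
    status = {v: "idle" for v in nodes}
    active = [list(p) for p in input_paths]
    finished = []
    for p in active:
        for v in p:
            status[v] = "active"

    def idle_tail(start):
        """Nodes of a path from `start` to an idle sink over idle nodes, or None."""
        prev, queue = {start: None}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w in prev or status[w] != "idle":
                    continue
                prev[w] = v
                if w in sinks:
                    tail = []
                    while w != start:
                        tail.append(w)
                        w = prev[w]
                    return tail[::-1]
                queue.append(w)
        return None

    # First stage: extend or clip every active path in each iteration.
    while active and len(active) >= math.sqrt(len(nodes)):
        taken, match = set(), {}
        for idx, p in enumerate(active):  # greedy maximal matching on H x H'
            for w in adj[p[-1]]:
                if status[w] == "idle" and w not in taken:
                    match[idx] = w
                    taken.add(w)
                    break
        remaining = []
        for idx, p in enumerate(active):
            if idx in match:  # extend by the matched idle node
                w = match[idx]
                p.append(w)
                status[w] = "active"
                (finished if w in sinks else remaining).append(p)
            else:  # clip the unmatched end-point
                status[p.pop()] = "dead"
                if p:
                    remaining.append(p)
        active = remaining

    # Second stage: fewer than sqrt(n) active paths, handled one by one.
    for p in active:
        while p:
            tail = idle_tail(p[-1])
            if tail is not None:
                for w in tail:
                    status[w] = "active"
                p.extend(tail)
                finished.append(p)
                break
            status[p.pop()] = "dead"
    return finished
```

With sources s1, s2, s3, sinks t1, t3, and edges s1-a, s2-a, a-t1, s3-t3, the first iteration extends s1 to a and s3 to t3 and clips s2; the second stage then extends the path (s1, a) to t1.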

The  following  lemma  is  sufficient  to  show  correctness  of  the  algorithm. 

Lemma  2.1  At  any  moment  during  an  execution  of  the  algorithm,  there  is  no  path  from  a  dead 
node  to  an  idle  sink  such  that  all  the  nodes  on  this  path  are  either  dead  or  idle. 

Proof:  Consider  an  iteration  of  the  first  stage  in  which  a  node  v  becomes  dead.  By  construction, 
v  becomes  dead  only  if  it  was  not  matched  during  computation  of  the  maximal  matching.  This 
means  that  at  the  end  of  this  iteration,  v  does  not  have  any  idle  neighbors.  On  the  other  hand,  if 


procedure Maximal-Paths(V, E, P_a);
    P_a - the set of active paths;
    P_d - the set of dead paths, each connecting a source to a sink;
    V_i - the set of idle nodes;
    V_a - the set of active nodes;
    V_d - the set of dead nodes;
    T - the set of sink nodes;

    {The first stage}
    V_a := nodes on paths in P_a;
    V_i := V − V_a;
    V_d := ∅;
    while |P_a| ≥ √n do begin
        H := set of end-points of paths in P_a;
        H' := {v' : v' ∈ V_i, ∃v ∈ H s.t. (v, v') ∈ E};
        M := maximal matching on (H × H') ∩ E;
        for all (v, v') ∈ M do begin
            extend the path corresponding to v with v';
            V_i := V_i − v';
            V_a := V_a + v';
            if v' ∈ T, remove this path from the set of active paths P_a;
        end;
        for all v ∈ H not matched by M do begin
            remove v from its path;
            if no nodes left on this path, remove it from P_a;
            V_a := V_a − v;
            V_d := V_d + v;
        end;
    end;

    {The second stage - number of active paths is below √n}
    for all P ∈ P_a do begin
        E' := ((V_i × V_i) ∪ (V_a × V_i)) ∩ E;
        v_r := the node closest to the end of P from which a sink is reachable via edges in E';
        remove nodes that follow v_r from P, extend P to a sink;
    end;
end.

Figure 1: The Maximal-Paths procedure
a node changed its status to dead during the second stage, then, by construction, there is no path
consisting of idle nodes only from this node to an idle sink. Furthermore, a node cannot change its
status to idle from any other status, and hence each path from a dead node to an idle sink must
pass through an active node. |

Each iteration of the second stage of the algorithm is essentially a connectivity computation,
which can be computed in O(log n) time and O(m) processors for the undirected case [SV82b],
and O(log^2 n) time and BFS(n, m) processors for the directed case [PR85]. The following lemma
bounds the number of iterations in the first stage.

Lemma 2.2  There are at most O(√n) iterations in the first stage.

Proof: The main idea of the proof is that nodes can change status only “in one direction”, and
that at each iteration a large number of nodes change status. Define a potential function

Φ = |V_a| + 2|V_i|.

An extension of a path by one node changes the status of this node from idle to active and reduces
Φ by one. On the other hand, when a path is clipped, its old end-point changes the status from
active to dead, again reducing Φ by one. At each iteration of the first stage there are at least √n
active paths. At the end of an iteration each one of these paths is either extended or clipped, which
causes a total reduction of at least √n in Φ. The claim follows, because Φ ≤ 3n. |
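
The counting behind this bound can be written out explicitly. The following only restates the argument above, with Φ denoting the potential |V_a| + 2|V_i| and r (a symbol introduced here) the number of first-stage iterations.

```latex
% Each first-stage iteration lowers \Phi by at least \sqrt{n},
% and \Phi is a sum of set sizes, so it never drops below 0.
\begin{align*}
  0 \;\le\; \Phi_{\mathrm{final}}
    \;\le\; \Phi_{\mathrm{initial}} - r\sqrt{n}
    \;\le\; 3n - r\sqrt{n}
  \quad\Longrightarrow\quad
  r \;\le\; \frac{3n}{\sqrt{n}} \;=\; 3\sqrt{n} \;=\; O(\sqrt{n}).
\end{align*}
```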

Each iteration of the first stage can be implemented in O(log^3 n) time and with O(m) processors,
using the maximal matching algorithm of [IS86]. This leads to the following theorem.

Theorem  2.3 


1.  On undirected graphs, the Maximal-Paths algorithm runs in O(√n log^3 n) time using O(n + m)
processors.

2.  On directed graphs, the Maximal-Paths algorithm runs in O(√n log^3 n) time using
BFS(n, m) processors.

Observe  that  a  maximal  set  of  node-disjoint  paths  corresponds  to  a  blocking  flow  in  matching 
networks  (described  in  detail  in  the  next  section).  Thus,  by  using  the  Maximal-Paths  procedure 
to find a blocking flow at each iteration of Dinic's maximum-flow algorithm [Din70,ET75], we can
compute  maximum  bipartite  matching  in  sublinear  time.  In  the  subsequent  sections  we  will  show 
more  efficient  algorithms  for  bipartite  matching  and  related  problems;  these  algorithms  do  not  use 
the  Maximal-Paths  algorithm. 


Depth-First  Search  Another  application  of  the  Maximal-Paths  algorithm  is  for  constructing  a 
deterministic  sublinear-time  algorithm  for  finding  a  depth-first  search  tree  in  an  undirected  graph. 
The  problem  of  finding  such  a  tree  has  been  studied  before  [Smi86,GB84],  and  recently  Aggarwal 
and  Anderson  have  found  a  randomized  NC  algorithm  for  it  [AA87].  However,  no  deterministic 
parallel  algorithm  for  the  problem  with  a  sublinear  running  time  was  known  previously. 

Although the Aggarwal-Anderson algorithm is randomized, the randomization is used only in order
to compute a maximum set of node-disjoint paths with the minimum weight. Aggarwal and
Anderson reduce this problem to the problem of finding a maximum matching. Careful examination
of  their  proofs  shows  that  instead  of  a  maximum  set  of  paths,  it  is  sufficient  to  be  able  to  find  a 
maximal  set  of  paths.  More  precisely,  it  is  sufficient  to  have  an  algorithm  that  solves  exactly  the 
generalization  of  the  node-disjoint  path  problem  that  is  solved  by  the  Maximal-Paths  algorithm. 
Therefore,  we  have  the  following  theorem. 

Theorem 2.4  A depth-first search tree in an undirected graph can be found in O(√n log^5 n) time
using O(n + m) processors.

3  Bipartite  Matching  and  Zero-One  Flows 

In this section we describe sublinear-time parallel algorithms for the bipartite matching problem
and  for  the  zero-one  network  flow  problems  (both  with  and  without  multiple  arcs).  For  the  purpose 
of  this  section,  we  assume  familiarity  with  the  maximum  flow  algorithms  of  [Gol85,GT86]. 

3.1  Bipartite  Matching  Algorithm 

To  solve  a  bipartite  matching  problem,  we  transform  it  into  a  zero-one  network  flow  problem  in  a 
standard way (see e.g. [Law76]). Given a bipartite graph with a node set S ∪ T, we direct edges
of the graph from nodes in S to nodes in T. We add a source s and arcs (s, v) for all v ∈ S, and a
sink t and arcs (w, t) for all w ∈ T. We define all arc capacities to be one. The resulting maximum
flow  problem  is  equivalent  to  the  original  bipartite  matching  problem.  We  call  a  network  that  can 
be  obtained  by  the  above  transformation  a  matching  network. 
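
As an illustration, the transformation can be sketched in a few lines of Python. The function name, the dictionary-of-capacities representation, and the default terminal names "s" and "t" are assumptions made here; the node names in S and T are assumed not to clash with the terminals.

```python
def matching_network(S, T, edges, source="s", sink="t"):
    """Build the unit-capacity network for a bipartite graph (S, T, edges).

    `edges` is a list of pairs (v, w) with v in S and w in T.
    Returns a dict mapping each arc (tail, head) to its capacity.
    """
    cap = {}
    for v in S:
        cap[(source, v)] = 1      # arcs (s, v) for all v in S
    for v, w in edges:
        cap[(v, w)] = 1           # original edges, directed from S to T
    for w in T:
        cap[(w, sink)] = 1        # arcs (w, t) for all w in T
    return cap


def matching_from_flow(edges, flow):
    """Edges of the original graph that carry one unit of an integral flow."""
    return [e for e in edges if flow.get(e, 0) == 1]
```

An integral flow of value c in this network saturates c arcs between S and T, no two of which share an end-point, which is why the two problems are equivalent.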

Two  possible  approaches  to  design  of  parallel  algorithms  for  the  problem  of  finding  maximum 
flows in matching networks suggest themselves. One approach is to use the Ford-Fulkerson
augmenting path algorithm [FF62] with a parallel breadth-first search subroutine. Another approach
is to use a parallel implementation of the Goldberg-Tarjan method [Gol85,GT86]. Both approaches
lead to superlinear-time algorithms, but for different reasons. The bottleneck of the first approach
is  a  potentially  large  number  of  augmenting  paths;  the  bottleneck  of  the  second  approach  is  a 
potentially  large  number  of  node  relabelings. 

Our  algorithm  works  in  two  stages,  using  the  Goldberg-Tarjan  approach  in  the  first  stage  and 
the  Ford-Fulkerson  approach  in  the  second  stage.  A  proper  balancing  of  the  two  stages  leads  to  a 


Push(v, w).
Applicability: e_f(v) > 0, u_f(v, w) > 0 and d(v) = d(w) + 1.
Action: Send δ = min(e_f(v), u_f(v, w)) units of flow from v to w as follows:
    f(v, w) ← f(v, w) + δ;  f(w, v) ← f(w, v) − δ;
    e_f(v) ← e_f(v) − δ;  e_f(w) ← e_f(w) + δ.

Relabel(v).
Applicability: Any v.
Action: d(v) ← min{d(w) + 1 | (v, w) ∈ E_f}.
(If this minimum is over an empty set, d(v) ← ∞.)

Figure 2: Push and relabel operations.

sublinear running time. In the context of sequential algorithms for the problem, similar balancing
has been used in [AO87,ET75] to obtain O(√n m) time bounds.

Before  describing  the  algorithm,  we  need  to  introduce  a  few  terms.  For  more  detailed  definitions, 
see [GT86]. A pseudoflow is a function on arcs of the network that obeys capacity constraints.
Given a pseudoflow f and a node v, we define the excess at v, e_f(v), to be the difference between
the incoming and the outgoing flows. The residual capacity of an arc (v, w) with respect to a
pseudoflow f is equal to the capacity of (v, w) minus f(v, w), and is denoted by u_f(v, w). Given a
pseudoflow f, we denote the corresponding residual graph by G_f = (V, E_f). (E_f is the set of arcs
in E with positive residual capacity.) A flow is a pseudoflow that obeys conservation constraints;
in other words, excesses at all nodes except the sink and the source are zero. A (valid) distance
labeling is an integer-valued function d on nodes that satisfies d(v) ≤ d(w) + 1 for every residual
arc (v, w). We say that a node v ∈ S ∪ T is active if e_f(v) > 0 and d(v) < k. (Our algorithm works
correctly for any value of k < n, but the best running time bound is achieved for k = n^{2/3}.) Given
a pseudoflow f and a distance labeling d, we define an admissible graph G(f, d) = (V, E(f, d))
by E(f, d) = {(v, w) ∈ E_f | d(v) = d(w) + 1}. We also assume familiarity with the push and relabel
operations described in Figure 2.
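
The push and relabel operations can be sketched in Python as follows. This is an illustration under assumptions made here: instead of storing the pseudoflow f, the sketch stores the residual capacities u_f directly (with a reverse arc of residual capacity zero for every arc), and the function names and dictionary layout are hypothetical. The relabel routine scans all arcs for simplicity.

```python
import math


def residual_graph(cap):
    """Residual capacities for the zero pseudoflow, reverse arcs included."""
    u = dict(cap)
    for v, w in cap:
        u.setdefault((w, v), 0)
    return u


def can_push(u, excess, d, v, w):
    """Applicability test: positive excess, residual arc, and d(v) = d(w) + 1."""
    return excess[v] > 0 and u.get((v, w), 0) > 0 and d[v] == d[w] + 1


def push(u, excess, d, v, w):
    """Send delta = min(excess, residual capacity) units of flow from v to w."""
    if not can_push(u, excess, d, v, w):
        raise ValueError("push is not applicable")
    delta = min(excess[v], u[(v, w)])
    u[(v, w)] -= delta          # f(v, w) increases by delta
    u[(w, v)] += delta          # f(w, v) decreases by delta
    excess[v] -= delta
    excess[w] += delta
    return delta


def relabel(u, d, v):
    """Set d(v) to min d(w) + 1 over residual arcs (v, w); infinity if none."""
    d[v] = min(
        (d[w] + 1 for (x, w), c in u.items() if x == v and c > 0),
        default=math.inf,
    )
    return d[v]
```

On the three-arc path s, a, x, t with the arc (s, a) saturated, relabeling a raises d(a) to 1, after which the unit of excess can be pushed from a to x.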

Figure 3 describes the bipartite matching algorithm. The algorithm consists of two phases. The
first phase is a variation of the Goldberg-Tarjan method. This phase is executed as long as the
number of active nodes is large, and, as a result, many distance labels increase at each (parallel)
time step. When the number of active nodes is small, excesses from these nodes are returned to
the source, and the second phase begins. In this phase, the algorithm finds augmenting paths from
the source to the sink one-by-one. The second stage works fast, because the residual flow is small.
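
The second phase can be illustrated with a sequential sketch: repeatedly find an augmenting path by breadth-first search in the residual graph and augment along it. The code below is an assumption-laden illustration rather than the parallel procedure; it uses a sequential BFS where the paper uses a parallel breadth-first search subroutine, it stores residual capacities in a dictionary, and the name `augment_until_maximum` is introduced here.

```python
from collections import deque


def augment_until_maximum(u, s, t):
    """Augment along BFS paths until t is unreachable from s.

    u: dict mapping arcs (v, w) to residual capacities (0 or 1); it is
    updated in place. Returns the number of augmenting paths found.
    """
    for v, w in list(u):
        u.setdefault((w, v), 0)           # make sure reverse arcs exist
    out = {}
    for v, w in u:
        out.setdefault(v, []).append(w)

    count = 0
    while True:
        prev, queue = {s: None}, deque([s])
        while queue and t not in prev:    # BFS over arcs with residual capacity
            v = queue.popleft()
            for w in out.get(v, []):
                if w not in prev and u[(v, w)] > 0:
                    prev[w] = v
                    queue.append(w)
        if t not in prev:
            return count                  # no augmenting path is left
        w = t
        while prev[w] is not None:        # push one unit along the path
            v = prev[w]
            u[(v, w)] -= 1
            u[(w, v)] += 1
            w = v
        count += 1
```

Each augmentation raises the flow value by one, so the number of iterations of the outer loop equals the residual flow value at the start of the phase, which is why a small residual flow makes this phase fast.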

The first stage of the algorithm starts by initializing the flow to zero, setting distance labels of
nodes in S ∪ T ∪ {t} to zero, and setting the distance label of the source to n. (Throughout the
algorithm, distance labels of source and sink never change: d(s) = n, d(t) = 0.) Then, all arcs going
out of the source are saturated. At this point, all nodes of S have excess of one. After the above


procedure Match(S, T, E);
    [initialization]
    transform the input problem into network flow form;

    [first stage]
    for all v ∈ S ∪ T do d(v) := 0;
    d(t) := 0; d(s) := n;
    for all (v, w) ∈ E do f(v, w) := 0;
    for all v ∈ S do f(s, v) := 1;
    for all w ∈ T ∪ {s, t} do e_f(w) := 0;
    for all v ∈ S do e_f(v) := 1;
    while the number of active nodes is at least l do Match-and-Push;
    return all excesses to the source;

    [second stage]
    while there is an augmenting path from s to t do
        find an augmenting path and augment;
    return the matching corresponding to the current (maximum) flow;

Figure 3: High-level description of the bipartite matching algorithm. For the algorithm described in this
paper, we take l = n^{2/3}.
initialization is complete, the Match-and-Push procedure is executed until the number of active
nodes is less than l. (As we shall see, the best running time is achieved for l = n^{2/3}.) Finally, at
the end of the first stage, the flow excesses are returned to the source. Namely, the excess flow is
pushed from nodes v ∈ S such that e_f(v) = 1 to s along (v, s).

The  Match-and-Push  procedure,  shown  in  Figure  4,  is  the  key  to  the  first  stage  of  our  algorithm. 
The  following  lemma  states  the  properties  of  this  procedure  that  are  essential  for  the  analysis  of 
the  algorithm. 

Lemma  3.1 

The  Match-and-Push  procedure  maintains  the  following  invariants: 

1.  The  current  pseudoflow  f  is  integral. 

2.  Indegree of a node v ∈ S in the residual graph G_f is 1 − e_f(v).

3.  For every node v ∈ S ∪ T, e_f(v) ∈ {0, 1}.

4.  On entry to and on exit from Match-and-Push, all nodes in T have zero excesses.

Proof :  Integrality  of  /  follows  by  induction  on  the  number  of  the  push  operations.  The  second 
invariant  follows  from  the  properties  of  matching  networks. 

Invariant  3  holds  after  the  initialization  by  the  structure  of  a  matching  network.  Suppose  that 
the invariant holds before an execution of Match-and-Push. Step 1 assures that it holds after Step
2.  The  relabeling  steps  3,  5,  and  6  cannot  affect  this  invariant.  Because  of  the  second  invariant, 
no  flow  can  be  pushed  to  an  active  node  and  at  most  one  unit  of  flow  can  be  pushed  to  an  inactive 
node  at  Step  4.  Thus  Step  4  preserves  the  invariant. 

Invariant  4  holds  because  after  Step  3,  every  node  in  T  has  excess  of  either  zero  or  one,  and 
because  of  the  relabeling  done  at  Step  2  every  node  with  excess  of  one  has  an  outgoing  admissible 
arc  that  can  be  used  to  push  the  excess  from  the  node  in  Step  4.  After  Step  4  all  nodes  in  T  have 
zero excesses. The remaining steps do not change the pseudoflow, and therefore invariant 4 holds
at  the  end  of  Match-and-Push.  | 

The  above  lemma  implies  that  the  last  step  of  the  first  phase,  namely  returning  flow  from  the 
nodes with excess to the source, is easy. More precisely, Invariants 3 and 4 imply that nodes in T
have no excess flow, and each node in S has at most one unit of excess. Furthermore, since k < n,
no flow is pushed from a node v ∈ S to s by the previous part of the algorithm, and therefore for
all v ∈ S, the residual capacities of arcs (v, s) are equal to one. Thus, the excess flow can be pushed
from nodes v ∈ S such that e_f(v) = 1 to s along (v, s). Note that these pushes are nonstandard,
i.e., they do not preserve the validity of d. This is not a problem, however, because we do not use
the  distance  labels  after  the  last  execution  of  Match-and-Push. 

We  start  our  analysis  of  the  algorithm  by  bounding  the  running  time  of  the  Match-and-Push 
procedure. 


Figure  4:  The  Match-and-Push  procedure 

Lemma 3.2  Procedure Match-and-Push runs in O(log^3 n) time on a CRCW PRAM with n + m
processors.

Proof: Steps 2-6 can be implemented so that each step takes O(log n) time on a CRCW PRAM
with n + m processors [Gol87,SV82b]. The bottleneck is Step 1, which takes O(log^3 n) time on a
CRCW PRAM using n + m processors [IS86]. |

Next we bound the number of executions of Match-and-Push.

Lemma 3.3  Procedure Match-and-Push is executed at most n(k + 1)/l + 1 times.

Proof: We call a relabeling of a node v significant if the relabeling increases d(v) and before the
relabeling d(v) < k. We show that at all except the last execution of Match-and-Push, there are at
least l significant relabelings. Since the distance labels never decrease, the total number of significant
relabelings is at most n(k + 1), and the desired bound follows.

We  claim  that  a  significant  relabeling  of  each  of  the  following  nodes  occurs  during  an  execution 
of  Match-and-Push: 

1.  The  nodes  in  T  which  are  matched  in  Step  1. 

2.  The  active  nodes  in  S  which  are  not  matched  in  Step  1. 

Note  that  in  all  except  maybe  the  last  execution  of  Match-and-Push,  the  number  of  nodes  satisfying 
the above two conditions is at least l, so establishing this claim completes the proof of the lemma.

Suppose a node w ∈ T is matched to a node v ∈ S at Step 1; this implies that d(v) < k and
d(w) = d(v) − 1 < k. We show that d(w) increases either at Step 3 or at Step 5. If d(w) increases
at Step 3, we are done. Otherwise, after Step 4, the only residual arc out of w is (w, v). At Step
1, the arc (v, w) is admissible and therefore d(v) = d(w) + 1. Node v has not been relabeled since
then, so d(v) did not change; by the above assumption, neither has d(w). By the definition of the
relabeling operation, at Step 5 the distance label of w becomes d(v) + 1, i.e., the distance label
increases by two.
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Now  consider  an  active  node  v  6  S  that  is  not  matched  at  Step  1.  By  definition  of  an  active 
node,  d(v)  <  k\  we  show  that  d(v)  increases  at  Step  6.  Note  that  the  arc  (u,  s)  cannot  be  admissible, 
because  d(v)  <  k  <  n.  Therefore  at  Step  1  all  admissible  neighbors  of  v  lie  in  T.  These  neighbors 
are  matched  at  Step  1,  and  therefore  by  Step  6  their  distance  labels  must  increase  (by  the  argument 
above).  Since  v  has  not  acquired  any  new  residual  neighbors,  its  distance  label  must  increase  at 
Step  6.  | 
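The "increases by two" step can be checked on a toy instance. The sketch below assumes the usual push/relabel rule that relabeling sets d(w) to one plus the smallest label among the residual neighbors of w; the helper name `relabel` and the dictionary layout are illustrative only and do not come from the paper.

```python
def relabel(d, residual_out, w):
    """Set d[w] to 1 + the smallest label over residual arcs (w, x).

    Returns the amount by which d[w] increased.
    """
    old = d[w]
    d[w] = min(d[x] for x in residual_out[w]) + 1
    return d[w] - old


# v in S and w in T were matched at Step 1, so (v, w) was admissible:
# d(v) = d(w) + 1.
d = {"v": 3, "w": 2}
# After Step 4 the only residual arc out of w is (w, v).
residual_out = {"w": ["v"]}

increase = relabel(d, residual_out, "w")
print(increase)  # 2
```

With d(v) unchanged, the new label of w is d(v) + 1 = d(w) + 2, which is the case analysed in the proof.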

The  above  two  lemmas,  combined  with  an  observation  that  the  initialization  of  the  first  stage 
can  be  done  in  constant  time  using  n  +  m  processors,  imply  the  following  result. 

Lemma 3.4 The first stage of the bipartite matching algorithm runs in O((nk/l) log^3 n) time using
n + m processors.

To  complete  the  analysis  of  the  algorithm,  we  need  the  following  lemma,  which  is  similar  to  a 
lemma  in  [ET75]. 

Lemma 3.5 After the first stage of the algorithm, the value of the residual flow is at most n/k + l.

Proof: We show that the amount of flow that can reach the sink after the last execution of
Match-and-Push is at most n/k + l. Since returning excesses to the source does not affect this amount,
the above claim implies the lemma.

Consider the pseudoflow f and the distance labeling d just after the last execution of
Match-and-Push, and let f* be an optimal flow. Consider the set of arcs A = {(i,j) | f*(i,j) > f(i,j)}. Note
that A ⊆ E_f. Arcs in A can be partitioned into a collection of simple paths from nodes with excess
to t, a collection of simple paths from nodes with excess to s, and a collection of cycles. By the
properties of matching networks [ET75], these paths and cycles are node-disjoint.

We need to show that the number of paths in the first collection is at most l + n/k. Consider
a residual path from a node v to t. Since d(v) is a lower bound on the distance from v to t in G_f,
the length of such a path is at least d(v). Thus, at most l paths in the first collection have length
of k or less. The remaining paths have length greater than k, and the number of such paths is
at most n/k, because the paths are node-disjoint. |

In the second stage of the algorithm, we find augmenting paths one by one. By Lemma 3.5,
we have the following bound on the running time of this stage.

Lemma 3.6 The second stage of the bipartite matching algorithm runs in O((n/k + l) log^3 n) time
using BFS(n,m) processors.

Remark:  In  the  second  stage,  the  algorithm  can  use  any  augmenting  path.  However,  the  fastest 
current  parallel  algorithm  finds  a  shortest  augmenting  path. 

The  running  time  bound  for  the  algorithm  is  as  follows: 


Theorem 3.7 The bipartite matching algorithm runs in O(n^{2/3} log^3 n) time using BFS(n,m)
processors.

Proof: Set k = n^{1/3} and l = n^{2/3} and apply Lemmas 3.4 and 3.6. |
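To see why this choice of k and l balances the two stages, the following sketch evaluates the bound of Lemma 3.3 on the number of Match-and-Push executions and the bound of Lemma 3.5 on the number of augmentations. The function name `stage_counts` is hypothetical, and the polylogarithmic cost per execution or augmentation is ignored.

```python
def stage_counts(n, k, l):
    """Return (first-stage executions, second-stage augmentations).

    first:  n(k + 1)/l + 1   (Lemma 3.3)
    second: n/k + l          (Lemma 3.5)
    """
    executions = n * (k + 1) / l + 1
    augmentations = n / k + l
    return executions, augmentations


# Choose n as a perfect cube so that k = n^(1/3) and l = n^(2/3) are exact.
k = 100
n = k ** 3
l = k ** 2

executions, augmentations = stage_counts(n, k, l)
print(executions, augmentations)  # 10101.0 20000.0
```

Both counts are within a small constant factor of n^{2/3} = 10000, so neither stage dominates the other by more than a constant.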

3.2  Zero-One  Flow  Algorithms 

In  this  section  we  describe  algorithms  for  computing  maximum  flows  in  networks  with  unit  arc 
capacities.  We  describe  two  algorithms,  one  optimized  for  general  zero-one  networks  and  another 
optimized  for  networks  with  no  multiple  arcs.  The  bound  achieved  by  the  algorithm  for  general 
zero-one  networks  can  also  be  obtained  by  transforming  the  input  network  into  a  matching  network 
and  applying  the  bipartite  matching  algorithm  described  above.  However,  the  method  we  describe 
in  this  section  leads  to  better  bounds  for  networks  with  no  multiple  arcs. 

First we describe the algorithm that finds a maximum flow in a general zero-one network
(V,E,s,t) with unit arc capacities (see Figure 5). At a high level, this algorithm is similar to
the algorithm of the previous section: the algorithm consists of two stages, the first based on the
Goldberg-Tarjan method and the second based on the Ford-Fulkerson method. The balancing of
work done in the two stages is similar to that of the sequential algorithms of [A087,ET75]. In
addition, the zero-one flow algorithm has a finish-up stage that converts a pseudoflow of maximum
value into a flow of maximum value. (By the value of a pseudoflow we mean the amount flowing into
the sink.)

The first stage initializes by setting the distance label of s to n, setting all other distance labels to
zero, setting the flow through arcs of the form (s,v) to one, and the flow through all other arcs to zero.
After that, the Push-and-Relabel procedure is iterated until the total amount of excess at active
nodes is less than l = m^{2/3}.

The Push-and-Relabel procedure is described in Figure 6. The use of parallel prefix computations
in Steps 2 and 3 is similar to the use of these operations in [Gol87]. (The use of parallel prefix
computations in the design of parallel algorithms is discussed in [Ble86,LM86].)
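As a sequential illustration of how prefix sums can split the excess of one node among its outgoing residual arcs, consider the sketch below. It follows the wording of Step 2 of Figure 6 literally (smaller distance labels first) and does not filter arcs by admissibility; the function name `distribute_excess` and the tuple layout are assumptions made for the example, and a PRAM implementation would compute the prefix sums in parallel rather than with `accumulate`.

```python
from itertools import accumulate


def distribute_excess(excess, arcs):
    """Split `excess` units among residual arcs out of one node.

    arcs: list of (w, d_w, residual_capacity) for residual arcs (v, w).
    Returns {w: amount pushed to w}.
    """
    arcs = sorted(arcs, key=lambda a: a[1])        # smaller labels first
    caps = [cap for _, _, cap in arcs]
    # Exclusive prefix sums: capacity available on arcs ranked before this one.
    before = [0] + list(accumulate(caps))[:-1]
    pushed = {}
    for (w, _, cap), b in zip(arcs, before):
        amount = max(0, min(cap, excess - b))      # what is left for this arc
        if amount > 0:
            pushed[w] = pushed.get(w, 0) + amount
    return pushed


result = distribute_excess(3, [("a", 2, 1), ("b", 0, 1), ("c", 1, 1), ("d", 5, 1)])
print(result)  # {'b': 1, 'c': 1, 'a': 1}
```

Each arc decides independently how much it carries once it knows the prefix sum of the capacities ranked ahead of it, which is what makes the step suitable for a parallel prefix computation.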

The second stage of the algorithm keeps finding augmenting paths from a node v ∉ {s,t} with
e_f(v) > 0 to the sink. One way to find such a path is to do a breadth-first search backwards
from the sink in the residual graph. When no such paths exist, the current pseudoflow f_end is of
maximum value.
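A sequential sketch of the backward breadth-first search mentioned above is given below. The representation (a list of residual arcs and an excess dictionary) and the name `augmenting_path` are assumptions for illustration; the parallel algorithm would use a parallel BFS within the BFS(n,m) processor bound instead.

```python
from collections import deque


def augmenting_path(residual_arcs, excess, s, t):
    """Search backwards from t in the residual graph.

    Returns a shortest residual path [v, ..., t] from some node v not in
    {s, t} with excess[v] > 0, or None if no such path exists.
    """
    incoming = {}
    for v, w in residual_arcs:
        incoming.setdefault(w, []).append(v)
    succ = {t: None}            # succ[x]: node after x on a shortest path to t
    queue = deque([t])
    while queue:
        w = queue.popleft()
        for v in incoming.get(w, []):
            if v in succ:
                continue
            succ[v] = w
            if v != s and excess.get(v, 0) > 0:
                path = [v]
                while path[-1] != t:
                    path.append(succ[path[-1]])
                return path
            queue.append(v)
    return None


arcs = [("a", "b"), ("b", "t"), ("s", "a"), ("c", "a")]
path = augmenting_path(arcs, {"c": 1}, "s", "t")
print(path)  # ['c', 'a', 'b', 't']
```

Because nodes are discovered in order of their distance to t, the first node with excess that is reached yields a shortest augmenting path.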

The finish-up stage converts f_end into a flow f by returning excesses from nodes in V − {s,t}
to s. This conversion is done by running the same algorithm on a modified network. The modified
network is obtained from G_{f_end} by adding a new source s' and new arcs of capacity one connecting
s' to nodes in V − {s,t} that have excesses with respect to f_end. If a node v has excess e_{f_end}(v),
then e_{f_end}(v) arcs of the form (s',v) are added. The source of the original network is the sink
of the modified network. It can be easily shown (see [Gol85,GT86]) that when the zero-one flow


procedure zero-one(V, E, s, t);

[first stage]

for all v ∈ V − {s} do d(v) := 0;
d(s) := n;
for all (v,w) ∈ E do f(v,w) := 0;
for all v ∈ V do e_f(v) := 0;
for all v ∈ V such that (s,v) ∈ E do begin
    f(s,v) := 1; e_f(v) := e_f(v) + 1;
end;
while the total amount of excess at active nodes is at least l do Push-and-Relabel;

[second stage]

while there is an augmenting path from a node v ∈ V − {s,t} such that e_f(v) > 0 to t do
    find an augmenting path from v to t and augment;

[finish-up stage]

if the current pseudoflow f_end is not a flow, convert it into a flow by recursively calling zero-one;
return (f);

end.


Figure 5: High-level description of the zero-one flow algorithm. For general zero-one networks, take
l = m^{2/3}; for zero-one networks with no multiple arcs, take l = min(m^{2/3}, (nm)^{2/5}).


Step 1. For all active nodes v, sort residual arcs (v,w) by the distance label of w.

Step 2. For all active nodes v, use a parallel prefix computation on the list of outgoing arcs to distribute
e_f(v) among residual neighbors of v, preferring residual neighbors with smaller distance labels.

Step 3. For all nodes v, use a parallel prefix computation on the list of incoming arcs to compute the new excess
e_f(v) by adding up the flow pushed to v during Step 2.

Step 4. Relabel all nodes v ≠ s, t.

Figure 6: The Push-and-Relabel procedure.


algorithm is applied to the modified network, its second stage terminates with a flow (rather than
a pseudoflow).

The correctness of the algorithm follows from [GT86]. The performance of the Push-and-Relabel
procedure is summarized by the following lemma.

Lemma 3.8 Procedure Push-and-Relabel runs in O(log n) time using m processors.

Proof: Cole's sorting algorithm [Col86] implements Step 1 in the desired resource bounds. The
bounds on Steps 2, 3, and 4 follow from [Gol87,SV82a]. |

The next lemma bounds the number of times Push-and-Relabel is applied.

Lemma 3.9 Procedure Push-and-Relabel is executed at most m(k + 1)/l times.

Proof: During the first stage of the algorithm, the distance label of a node increases at most k + 1
times. Since each push in a zero-one network is a saturating push, the total number of pushes is at
most m(k + 1) [GT86]. At each execution of Push-and-Relabel there are at least l units of excess
at active nodes, and therefore at least l pushes are performed. |

The  above  two  lemmas,  combined  with  an  observation  that  the  initialization  of  the  first  stage 
can  be  done  in  constant  time  using  n  +  m  processors,  imply  the  following  result. 

Lemma 3.10 The first stage of the zero-one flow algorithm runs in O((mk/l) log n) time using n + m
processors.

In the second stage we find augmenting paths one by one. To bound the running time of the
stage, we first bound the value of the residual flow after the execution of the first stage. The
following lemma is similar to Lemma 3.5.

Lemma 3.11 After the first stage of the algorithm is applied to a general zero-one network, the
value of the residual flow is at most m/k + l.

Proof: We show that the amount of flow that can reach the sink after the last execution of
Push-and-Relabel is at most m/k + l. Since returning excesses to the source does not affect this amount,
the above claim implies the lemma.

Consider the pseudoflow f and the distance labeling d just after the last execution of
Push-and-Relabel, and let f* be an optimal flow. Consider the set of arcs A = {(i,j) | f*(i,j) > f(i,j)}. Note
that A ⊆ E_f. Arcs in A can be partitioned into a collection of simple paths from nodes with excess
to t, a collection of simple paths from nodes with excess to s, and a collection of cycles. Since we
have a zero-one network, these paths and cycles are arc-disjoint.

We need to show that the number of paths in the first collection is at most l + m/k.
Consider a residual path from a node v to t. Since d(v) is a lower bound on the distance from v to
t in G_f, the length of such a path is at least d(v). Thus at most l paths in the first collection have
length of k or less. The remaining paths have length greater than k, and the number of such
paths is at most m/k, because the paths are arc-disjoint. |

Lemma 3.12 The second stage of the zero-one flow algorithm runs in O((m/k + l) log^2 n) time using
BFS(n,m) processors.

Proof: The lemma follows from Lemma 3.11. |

The  running  time  bound  for  the  algorithm  on  general  zero-one  networks  is  as  follows: 

Theorem 3.13 On general zero-one networks, the zero-one flow algorithm runs in O(m^{2/3} log^2 n)
time using BFS(n,m) processors.

Proof: Set k = m^{1/3} and l = m^{2/3}. Lemmas 3.10 and 3.12 imply that the first two stages of the
algorithm run in the desired resource bounds. These lemmas also imply that the finish-up stage
runs in the same resource bounds. |

Now we consider the problem of finding maximum flows in zero-one networks with no multiple
arcs. In this case, we can improve the time bound of Theorem 3.13 for dense graphs (more precisely,
for m > n^{3/2}). The following lemma, similar to a lemma in [ET75], is a key to the improvement.

Lemma 3.14 After the first stage of the algorithm is applied to a zero-one network with no multiple
arcs, the value of the residual flow is at most (2n/k)^2 + l.

Proof: We show that the amount of flow that can reach the sink after the last execution of
Push-and-Relabel is at most (2n/k)^2 + l. Since returning excesses to the source does not affect this amount,
the above claim implies the lemma.

Consider the pseudoflow f and the distance labeling d just after the last execution of
Push-and-Relabel, and let f* be an optimal flow. Consider the set of arcs A = {(i,j) | f*(i,j) > f(i,j)}. Note
that A ⊆ E_f. Arcs in A can be partitioned into a collection of simple paths from nodes with excess
to t, a collection of simple paths from nodes with excess to s, and a collection of cycles. Since we
have a zero-one network, these paths and cycles are arc-disjoint. The number of paths in the first
collection that start at a node with a distance label of k or less is at most l.

To complete the proof, we need to show that the number of paths that start at a node with a
distance label greater than k and reach the sink is at most (2n/k)^2. Suppose for contradiction that
this is false. Let P be the set of these paths, and let G' = (V, E') be the graph induced by the arcs on
paths in P. Let d'(v) be the distance in G' from v to t. Let V_i = {v ∈ V | d'(v) = i}. By definition
of P, no path in P starts at a node in the set V_0 ∪ V_1 ∪ ... ∪ V_k. Therefore |P| is bounded, for
0 ≤ j ≤ k − 1, by the number of arcs in the set E' ∩ (V_{j+1} × V_j), which is at most |V_j| × |V_{j+1}|
(since the network has no multiple arcs). Our assumption implies that |V_j| × |V_{j+1}| > (2n/k)^2, and
therefore |V_j| + |V_{j+1}| > 4n/k, for 0 ≤ j ≤ k − 1. We obtain a contradiction as follows (for k ≥ 2):

n ≥ (|V_0| + |V_1|) + (|V_2| + |V_3|) + ... + (|V_{2⌊k/2⌋−2}| + |V_{2⌊k/2⌋−1}|)
  > ⌊k/2⌋ · (4n/k)
  ≥ ((k − 1)/2) · (4n/k)
  ≥ n. |


Using Lemmas 3.11 and 3.14, one can obtain the following theorem; the proof is similar to the
proof of Theorem 3.13.

Theorem 3.15 On zero-one capacity networks with no multiple arcs, the maximum flow algorithm
runs in time O(min(m^{2/3}, (nm)^{2/5}) log^2 n) on a CRCW PRAM using BFS(n,m) processors.
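The exponent 2/5 and the density threshold m > n^{3/2} can be recovered by balancing the two stages. The derivation below is a sketch that drops constant and logarithmic factors and reads the bound of Lemma 3.14 as being of order (n/k)^2 + l; the particular value of k it produces is a consequence of that reading, not a statement taken from the paper.

```latex
% Balancing the two stages when the network has no multiple arcs.
% Assumption: constants and log factors are dropped.
\begin{align*}
\text{first stage: } & \frac{mk}{l} \text{ executions of Push-and-Relabel (Lemma 3.9)},\\
\text{second stage: } & \frac{n^2}{k^2} + l \text{ augmentations (Lemma 3.14)}.\\
\intertext{Setting $mk/l = l$ gives $k = l^2/m$; substituting into $n^2/k^2 = l$:}
\frac{n^2 m^2}{l^4} = l
  \;&\Longrightarrow\; l = (nm)^{2/5}, \qquad k = \frac{l^2}{m} = n^{4/5} m^{-1/5}.\\
\intertext{This choice improves on $l = m^{2/3}$ exactly when}
(nm)^{2/5} < m^{2/3}
  \;&\Longleftrightarrow\; n^{6} < m^{4}
  \;\Longleftrightarrow\; m > n^{3/2}.
\end{align*}
```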


4  The  Assignment  Problem  and  Minimum-Cost  Flows 

In  this  section  we  describe  parallel  algorithms  for  the  weighted  versions  of  the  problems  studied 
in  the  previous  section,  namely  the  assignment  problem  and  the  minimum-cost  flow  problem  with 
zero-one  capacities.  For  the  purpose  of  this  section,  we  assume  familiarity  with  the  minimum-cost 
flow  framework  of  [Gol87,GT87a,GT87b]. 

4.1  The  Assignment  Problem 

The  assignment  problem  is  a  weighted  version  of  the  bipartite  matching  problem.  Similarly  to  the 
unweighted  case  described  in  Section  3.1,  we  transform  the  assignment  problem  into  the  zero-one 
minimum-cost  flow  problem  in  the  standard  way  (see  e.g.,  [Law76])  where  the  weights  on  the  edges 
are  mapped  into  costs  on  the  arcs  of  the  transformed  network.  Without  loss  of  generality,  we 
assume  that  a  perfect  matching  exists.  To  assure  this  we  can  always  add  a  matching  with  arcs  of 
very  high  cost. 

In order to describe the algorithm, we need to introduce a few definitions (see [Gol87,GT87a]
for more details). Each node v is assigned a price p(v). Given p, the reduced cost of an arc (v,w)
is defined by c_p(v,w) = p(v) − p(w) + c(v,w), where c(v,w) is the original cost that is part of the
input to the problem. Define C = max{|c(v,w)| : (v,w) ∈ E}. We say that a pseudoflow is
ε-optimal if there are no residual arcs with reduced cost below −ε. Given a pseudoflow f and a
price function p, an arc (v,w) ∈ E is admissible if it is a residual arc with negative reduced cost,

procedure Assignment(S, T, E);

[Initialization]

transform the input problem into network flow form;
V' := S ∪ T ∪ {s, t};
E' := E ∪ ({s} × S) ∪ (T × {t});
C := max{|c(v,w)| : (v,w) ∈ E};
ε := C;
for all v ∈ V' do p(v) := 0;
p(s) := −2nC;

while ε ≥ 1/n do begin
    ε := ε/2;
    (ε, p, f) := Refine(V', E', c, p, ε);
end;

return the matching corresponding to the current (maximum) flow f;

end.


Figure  7:  High-level  description  of  the  outer  (scaling)  loop  of  the  assignment  algorithm. 


i.e., if u_f(v,w) > 0 and c_p(v,w) < 0. Define c(f), the cost of pseudoflow f, by

c(f) = Σ_{(v,w)∈E: f(v,w)>0} c(v,w) f(v,w).

In the case of zero-one flows, the cost of a pseudoflow is equal to the sum of the costs of the
saturated arcs.
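The definitions of reduced cost, ε-optimality, and admissibility translate directly into code. In the sketch below, costs and prices are stored in dictionaries and the residual graph is given as a list of arcs; these representations and the function names are assumptions made for illustration.

```python
def reduced_cost(c, p, v, w):
    """c_p(v, w) = p(v) - p(w) + c(v, w)."""
    return p[v] - p[w] + c[(v, w)]


def is_eps_optimal(residual_arcs, c, p, eps):
    """True if no residual arc has reduced cost below -eps."""
    return all(reduced_cost(c, p, v, w) >= -eps for v, w in residual_arcs)


def admissible_arcs(residual_arcs, c, p):
    """Residual arcs with negative reduced cost."""
    return [(v, w) for v, w in residual_arcs if reduced_cost(c, p, v, w) < 0]


c = {("a", "b"): 3, ("b", "a"): -3}
p = {"a": 0, "b": 4}
residual = [("a", "b"), ("b", "a")]

print(reduced_cost(c, p, "a", "b"))        # -1
print(is_eps_optimal(residual, c, p, 1))   # True
print(admissible_arcs(residual, c, p))     # [('a', 'b')]
```

Here the arc (a,b) has reduced cost −1, so the pseudoflow is 1-optimal but not ε-optimal for any ε < 1, and (a,b) is the only admissible arc.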

The outer loop of the minimum-weight bipartite matching algorithm, shown in Figure 7, uses
generalized cost scaling [Gol87,GT87a]. Initially ε = C. The algorithm iteratively halves ε and
uses the Refine procedure to update the flow to be ε-optimal again. It can be shown [Ber86] that
an ε-optimal flow is optimal for ε < 1/n, and therefore we have the following lemma.

Lemma 4.1 [Gol87] The algorithm terminates and produces an optimal flow after O(log(nC)) calls
to the Refine procedure.
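The O(log(nC)) bound is simply the number of halvings needed to bring ε from C to below 1/n. The sketch below counts them exactly, assuming the loop continues while ε ≥ 1/n so that it stops with ε < 1/n; the function name `count_refine_calls` is hypothetical, and Refine itself is not executed.

```python
from fractions import Fraction


def count_refine_calls(n, C):
    """Number of iterations of the scaling loop for integer n >= 1, C >= 1."""
    eps = Fraction(C)
    calls = 0
    while eps >= Fraction(1, n):
        eps /= 2
        calls += 1          # one call to Refine per halving
    return calls


print(count_refine_calls(4, 8))  # 6
```

For n = 4 and C = 8 we have nC = 32, and the loop runs floor(log2(32)) + 1 = 6 times, ending with ε = 1/8 < 1/4.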

The heart of the algorithm is the Refine procedure, shown in Figure 8, that converts a
2ε-optimal pseudoflow into an ε-optimal flow. The procedure starts by decreasing the prices of all the
nodes in T by 2ε. (Though somewhat unnatural, this is essential for the proof of Lemma 4.3.) Next
it constructs an ε-optimal pseudoflow by saturating residual arcs with reduced cost below −ε and
creating appropriate excesses and deficits at the nodes.

The resulting pseudoflow is converted into an ε-optimal flow by the rest of Refine, which consists
of two stages. The first stage iteratively uses the Match-and-Push procedure (see Figure 4) to push
the positive excesses towards the negative ones. The Match-and-Push procedure used during this

procedure Refine(V', E', c, p, ε);

[Reduce the number of arcs with negative reduced cost.]

for all v ∈ T do p(v) := p(v) − 2ε;

[Convert into ε-optimal pseudoflow.]

for all v ∈ V' do e_f(v) := 0;
for all (v,w) ∈ E' do f(v,w) := 0;
for all {(v,w) | (v,w) ∈ E' and c_p(v,w) < −ε} do begin
    f(v,w) := 1;
    e_f(w) := e_f(w) + 1;
    e_f(v) := e_f(v) − 1;
end;

[First stage.]

while the number of active nodes is at least l do Match-and-Push;

[Second stage]

for all (v,w) ∈ E do length(v,w) := c(v,w) + ε;
while there are active nodes do begin
    let Γ be a shortest path w.r.t. length from an active node to t;
    augment along Γ;
end;

return (ε, p, f);


Figure 8: High-level description of the inner loop of the assignment algorithm. For the algorithm
described in this paper, we take l = n^{2/3}.

Push(v, w).

Applicability: e_f(v) > 0, u_f(v,w) > 0 and c_p(v,w) = p(v) − p(w) + c(v,w) < 0.
Action: Send δ = min(e_f(v), u_f(v,w)) units of flow from v to w as follows:

f(v,w) ← f(v,w) + δ; f(w,v) ← f(w,v) − δ;
e_f(v) ← e_f(v) − δ; e_f(w) ← e_f(w) + δ.


Relabel(v).

Applicability: Any v.

Action: p(v) ← max{p(w) − c(v,w) − ε | (v,w) ∈ E_f}.

(If this maximum is over an empty set, p(v) ← −∞.)

Figure 9: Push and relabel operations for minimum-cost flow computation.

stage is exactly the same as for the unweighted case (see Figure 4), except that Push and Relabel
are generalized to the weighted case as described in Figure 9. We say that a node v is active if
e_f(v) > 0 and the price change of this node during the current invocation of Refine is below kε.
The first stage is executed as long as the number of active nodes is at least l. (The best running time
is achieved for l = n^{2/3} and k = n^{1/3}.)
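The two operations of Figure 9 can be sketched sequentially as follows. The dictionaries `f`, `e`, `u_f`, `p`, `c` and the adjacency map `residual_out` are an assumed representation, and the explicit update of residual capacities is bookkeeping added for the example; the caller is expected to check the applicability conditions before calling `push`.

```python
import math


def push(f, e, u_f, v, w):
    """Send delta = min(e[v], u_f[(v, w)]) units of flow from v to w."""
    delta = min(e[v], u_f[(v, w)])
    f[(v, w)] = f.get((v, w), 0) + delta
    f[(w, v)] = f.get((w, v), 0) - delta
    u_f[(v, w)] -= delta                       # residual capacity bookkeeping
    u_f[(w, v)] = u_f.get((w, v), 0) + delta
    e[v] -= delta
    e[w] = e.get(w, 0) + delta
    return delta


def relabel(p, c, residual_out, v, eps):
    """p(v) <- max{p(w) - c(v, w) - eps} over residual arcs (v, w)."""
    candidates = [p[w] - c[(v, w)] - eps for w in residual_out.get(v, [])]
    p[v] = max(candidates) if candidates else -math.inf


# Push one unit along an admissible arc (v, w): c_p(v, w) = 0 - 1 + 0 < 0.
f, e, u_f = {}, {"v": 1}, {("v", "w"): 1}
p = {"v": 0, "w": 1}
c = {("v", "w"): 0, ("w", "v"): 0}
push(f, e, u_f, "v", "w")

# Suppose the only residual arc out of w is now (w, v); relabel w with eps = 1.
relabel(p, c, {"w": ["v"]}, "w", 1)
print(p["w"])  # -1
```

In this toy run the price of w drops from 1 to −1, a decrease of at least ε, which is the kind of price change counted in the analysis of the first stage.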

The second stage of Refine finds augmenting paths from nodes with excess to the sink and
augments along these paths. The paths used for the augmentations are shortest paths with respect
to the length function l obtained by adding ε to the costs. Since we have assumed that the input
graph has a perfect matching, Refine terminates with a flow. The following lemma shows that the
second stage preserves ε-optimality.

Lemma 4.2 Suppose a pseudoflow f is ε-optimal, and let l : E → R be defined by l(v,w) =
c(v,w) + ε. Let Γ be a shortest path with respect to l in G_f. Then changing the flow along Γ preserves
ε-optimality.

Proof:  Straight-forward,  given  the  results  of  [GT87a].  | 

The  following  lemma  is  needed  to  bound  the  residual  flow  after  the  first  stage. 

Lemma 4.3 After the first stage of Refine, the value of the residual flow is O(n/k + l).

Proof: Consider nodes other than s and t that have excesses at the end of the first stage. We show
that the sum of the price decreases at these nodes during the first stage is bounded by O(nε). At the end
of the first stage there can be at most l nodes with excesses such that their price was decreased by
less than kε during the first stage. Each excess has a value of one, and therefore the total amount
of excess at the end of the stage is O(n/k + l). This fact gives the desired bound on the value of
the residual flow.

The first step decreases the prices of all the nodes in T by 2ε. This increases the reduced costs
of arcs that go into nodes in T by 2ε. The input is 2ε-optimal, and therefore after this increase all
residual arcs that go into nodes in T have nonnegative reduced cost. Hence the number of remaining residual
arcs with negative reduced cost is at most n. On the other hand, the reduced cost of arcs going
out of nodes in T is decreased by 2ε, and therefore the flow is 4ε-optimal with respect to the new
prices.

Without loss of generality, we can assume that at this point all costs are replaced by the reduced
costs, and all prices are set to zero. We call the resulting costs transformed.

Next we give lower and upper bounds on the cost of an ε-optimal flow in a network with
transformed costs. The transformed arc costs are at least −4ε. Moreover, there are at most n
negative-cost arcs. Therefore, for any pseudoflow f in this network, we have

cost(f) ≥ −4nε.

To bound the cost of any ε-optimal flow f_ε from above, consider the decomposition of this flow
into paths from s to t and cycles. The network is a matching network and therefore these paths
and cycles are node-disjoint. The prices of both s and t are zero because they are never relabeled,
and therefore the cost of f_ε is equal to the sum of the reduced costs of the saturated arcs. For
any saturated arc (v,w) such that f_ε(v,w) = 1, there is a residual arc (w,v) ∈ E_{f_ε} with cost
c(w,v) = −c(v,w) ≥ −ε, and therefore we have

cost(f_ε) ≤ nε.

Consider a pseudoflow f' at some point of the execution of the first stage. Define the set of
arcs A' = {(i,j) | f_ε(i,j) > f'(i,j)}. Arcs in A' can be partitioned into simple paths from nodes
with excess to nodes with deficit, simple paths from s to t, and simple cycles. Note that A' ⊆ E_{f'},
where E_{f'} is the set of residual arcs of the pseudoflow f', and therefore the reduced cost of any arc in A'
is at least −ε. Let |P| denote the length of a path P. Then the cost of a path P from a node v with
excess to a node w with deficit is

cost(P) = Σ_{e∈P} c(e) = Σ_{e∈P} c_p(e) + p(w) − p(v) ≥ −p(v) − |P|ε.

The last inequality holds because nodes with deficit are not relabeled. By the properties of matching
networks, the sum of the lengths of the paths and the cycles is at most n. The cost of any pseudoflow
is at least −4nε and the cost of f_ε is at most nε. Hence, the sum of the costs of the paths and cycles is at
most 5nε. Taking into account that the cost of a cycle is at least the length of the cycle times −ε,
we have

−Σ_{v: e(v)>0} p(v) ≤ 6nε.

Hence, the number of nodes that were relabeled by kε is at most 6n/k, and the bound follows. |

The above lemma bounds the number of iterations of the second stage of Refine. Next we bound
the number of iterations of the first stage.

Lemma 4.4 There are at most nk/l + 1 calls to Match-and-Push in the first stage of Refine.

Proof: The proof is similar to the proof of Lemma 3.3. The main idea is that as long as there are
at least l active nodes, the prices of at least l nodes decrease by at least ε each during one invocation
of Match-and-Push. The claim then follows because the total amount of relabeling during the first
stage of Refine is at most nkε.

Consider an active node v ∈ S that was matched with w ∈ T during the first step of
Match-and-Push. If v pushes to w and w pushes to some node v' ≠ v, the only residual arc from w after this
push is (w,v). But the arc (v,w) was admissible, and therefore at the beginning of the iteration
c_p(v,w) = p(v) − p(w) + c(v,w) < 0. By definition of Relabel (see Figure 9), and since
c(w,v) = −c(v,w), the price of w decreases by

Δp(w) = p(w) − p(v) + c(w,v) + ε > (p(v) + c(v,w)) − p(v) + c(w,v) + ε = ε.

Similarly, if the push was back to v, it is easy to see that w is also relabeled by at least ε.

Consider a node v that is not matched during the first step. From the previous discussion it
follows that each node w ∈ T, such that (v,w) is a residual arc, is relabeled by at least ε. Hence,
at Step 6, v is also relabeled by at least ε. |

Given  the  similarity  of  the  above  lemmas  with  the  corresponding  lemmas  for  the  unweighted 
case,  it  is  not  surprising  that  the  running  times  are  similar. 

Theorem 4.5 The assignment algorithm runs in O(n^{2/3} log^3 n log(nC)) time using SSP(n,m)
processors.

Proof: The initialization of Refine can be done in O(log n) time with n + m processors. By
Lemma 4.4, the first stage makes at most nk/l + 1 calls to Match-and-Push. By Lemma 3.2,
Match-and-Push runs in O(log^3 n) time on a CRCW PRAM with n + m processors. Hence, by Lemma 4.3,
each iteration of Refine takes O(n^{2/3} log^3 n) time on a CRCW PRAM using SSP(n,m) processors.

By Lemma 4.1, after O(log(nC)) iterations of Refine the resulting flow is optimal. The claim
follows by setting k = n^{1/3}, l = n^{2/3}. |

4.2  Zero-One  Minimum-Cost  Flows 

In this section we study the minimum-cost flow problem with unit capacities on arcs. To solve such
a problem ((V, E), c, s, t), we use a classical transformation (see e.g. [CSV84,KUW86]) to reduce
this problem to a problem of finding a minimum-weight bipartite matching. The transformation
is as follows. First, observe that there is a one-to-one correspondence between a minimum-cost
flow in G and a maximum set of node-disjoint paths of minimum weight in the line-graph G' of
G. Split each node v in G' into two nodes v_1 and v_2, connect them by a zero-weight edge, and
connect u_2 to v_1 for each arc (u,v) in G'. Call the resulting graph G". It is easy to see that a
minimum-weight bipartite matching in G" corresponds to the maximum set of node-disjoint paths
of minimum weight in G', and therefore solves the minimum-cost flow problem for G. The number
of nodes in G" is O(m), which leads to the following theorem:

Theorem 4.6 The Minimum-Cost Flow algorithm for general zero-one networks runs in time
O(m^{2/3} log^3 n log(nC)) on a CRCW PRAM using SSP(n,m) processors.

Proof: The theorem follows from the observation |V"| = 2m and Theorem 4.5. |
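The size claim |V"| = 2m can be checked on a small instance with the structural sketch below. It treats each arc of G as a node of the line graph and splits it into two copies; the handling of s and t and the assignment of weights to the connecting edges are not spelled out here, so the sketch leaves them out. The name `split_line_graph` and the requirement that arcs be distinct tuples are assumptions for the example.

```python
def split_line_graph(arcs):
    """arcs: list of distinct arcs (a, b) of G.

    Returns (nodes, split_edges, link_edges) of the bipartite graph G'':
    split_edges join the two copies of a line-graph node (zero weight),
    link_edges join copy 2 of u to copy 1 of v for each line-graph arc (u, v).
    """
    by_tail = {}
    for arc in arcs:
        by_tail.setdefault(arc[0], []).append(arc)

    nodes, split_edges, link_edges = [], [], []
    for x in arcs:
        nodes += [(x, 1), (x, 2)]
        split_edges.append(((x, 1), (x, 2)))
    for u in arcs:
        for v in by_tail.get(u[1], []):     # head of u is the tail of v
            link_edges.append(((u, 2), (v, 1)))
    return nodes, split_edges, link_edges


G = [("s", "a"), ("a", "b"), ("a", "t"), ("b", "t")]
nodes, split_edges, link_edges = split_line_graph(G)
print(len(nodes), len(split_edges), len(link_edges))  # 8 4 3
```

With m = 4 arcs the construction yields 2m = 8 nodes, and every edge joins a copy-1 node to a copy-2 node, so the result is bipartite.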

Remark: We can extend the result of the theorem to networks with arbitrary integral capacities
represented in unary by replacing every arc (v,w) of capacity u(v,w) and cost c(v,w) by u(v,w)
arcs of capacity one and cost c(v,w).
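The replacement described in the remark is mechanical; a minimal sketch follows, assuming arcs are given as (v, w, capacity, cost) tuples with nonnegative integer capacities. The function name `expand_to_unit_arcs` is illustrative.

```python
def expand_to_unit_arcs(arcs):
    """Replace each arc of capacity u and cost c by u parallel unit-capacity arcs.

    arcs: list of (v, w, capacity, cost).
    Returns a list of (v, w, cost), each standing for an arc of capacity one.
    """
    unit_arcs = []
    for v, w, capacity, cost in arcs:
        unit_arcs.extend((v, w, cost) for _ in range(capacity))
    return unit_arcs


expanded = expand_to_unit_arcs([("a", "b", 3, 5), ("b", "c", 1, 2)])
print(expanded)
# [('a', 'b', 5), ('a', 'b', 5), ('a', 'b', 5), ('b', 'c', 2)]
```

The resulting network generally has multiple arcs, and its arc count equals the sum of the original capacities, which is why the capacities must be represented in unary for the bound to carry over.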

The ideas of the previous section can be extended to get a sublinear-time algorithm that finds
a maximum flow of minimum cost in graphs with zero-one capacities without using the above
reduction. The resulting algorithm runs in O(m^{2/3} log^2 n log(nC)) time, i.e., improves the running
time bound of Theorem 4.6 by a factor of log n. However, we are unable to obtain better bounds,
suggested by Theorem 3.15, for the case of graphs with no multiple arcs. Since the algorithm is
obtained by a straightforward combination of the results of this section and Section 3.2, we omit
the details.

Remark: Note that our algorithms find an optimal primal solution but not the optimal dual
solution. Our algorithms can be modified to compute an optimal dual solution as well, without
increasing the asymptotic running time and processor bounds. See e.g. [GT88].